Abstract

The use of model checking for validation requires that models of the
underlying system be created. Creating such models is both difficult
and error prone, and as a result verification is rarely used despite
its advantages. In this paper, we present a method for automatically
extracting models from low-level software implementations. Our method
is based on the use of an extensible compiler system, xg++, to perform
the extraction. The extracted model is combined with a model of the
hardware, a description of correctness, and an initial state. The whole
model is then checked with the Murφ model checker. As a case study, we
apply our method to the cache coherence protocols of the Stanford FLASH
multiprocessor. Our system has a number of advantages. First, it
reduces the cost of creating models, which allows model checking to be
used more frequently. Second, it increases the effectiveness of model
checking since the automatically extracted models are more accurate and
faithful to the underlying implementation. We found a total of 8 errors
using our system. Two errors were global resource errors, which would
be difficult to find through any other means. We feel the approach is
applicable to other low-level systems.


1  Introduction 

Our ability to design and manufacture increasingly complex systems is
quickly outstripping our ability to verify those systems. The
traditional method of verification is testing through trials. However,
it becomes exponentially more difficult to fully exercise a system
through testing as the number of control paths and corner cases
increases. The result is increased system cost and decreased system
reliability.

Formal verification methods are an attempt to solve this problem
[17, 18, 21]. One option is model checking, which is the systematic and
exhaustive exploration of the system state space. The computational
complexity of model checking makes it impractical for full system
models, so it is common to abstract system behavior (which means to
suppress implementation details) or to scale the system down (which
means to model a small instance of the system, say, three processors
instead of 64). Performing one or both of these usually covers a
greater range of behavior than conventional testing, and so uncovers
bugs that testing does not. It is important to note that when used in
this way, model checking abandons the traditional goal of formal
verification, which is proving the “correctness” of a system, in favor
of the more pragmatic goal of discovering bugs.

The difficulty of abstracting the design, a process that involves a
great deal of manual effort, hampers the use of model checking in
actual system design. Moreover, human errors in the manual abstraction
cause bugs to be missed and raise false alarms during the verification
process, further increasing the cost and reducing the usefulness of
model checking. Such errors can be introduced both when constructing
the model and as a result of “drift” as the actual system evolves [8].

This paper focuses on making model checking practical by developing
techniques to automatically extract model descriptions from code. As a
case study, we apply our methods to the cache coherence protocols used
on the Stanford FLASH multiprocessor [15]. A FLASH protocol
implementation consists of a collection of event-driven software
handlers that are dispatched according to the requests that arrive on
the various interfaces. These handlers, which run on the node
controller, send messages on the I/O, processor, and network interfaces
to maintain a directory of cache line states and service cache line
requests.

Conventional simulation-based verification of FLASH has found many
protocol bugs. Nevertheless, no protocol has booted perfectly on the
hardware on the first try [7]. Using simulation to verify the protocols
has been inadequate because of the limited and fixed detail level of
the simulator and the high cost of simulating a large number of paths.

Though our approach could have been applied to a wide range of systems,
FLASH protocol code has a number of features that make it a good case
study of our approach. First, the catastrophic nature of coherence code
bugs has already led others to use manually driven model checking to
check FLASH protocols [19]. Thus, we can compare our method with a more
conventional verification technique. Second, FLASH is representative of
low-level code that exists on a variety of embedded systems. It is
highly optimized and difficult to read, and thus difficult to specify
correctly. Finally, for the purpose of finding errors, FLASH represents
a hard test: it is real, working systems code that has undergone years
of testing under simulation, on a real machine, and via formal
verification. The main protocol we check, dyn-ptr, has been under
constant use for over five years and has formed the basis for almost
all experimental results on the hardware [13].

The critical enabling technology for our approach is an extensible
compiler, xg++ [7, 10]. xg++ allows users to easily write
domain-specific analysis extensions using a language called metal.
There are two types of extensions: extensions that perform extraction,
and extensions that perform translation. Extraction extensions select
sections of protocol code to be modeled, while translation (printing)
extensions translate the extracted protocol code into a Murφ model
description. xg++ uses program slicing to extract the selected sections
of the implementation, while the translation is performed on the
sliced-out abstract syntax tree (AST) [23]. Because the extraction is
flexible, the author of the extensions can use human judgment to
abstract away implementation details in order to focus on the important
aspects and exploit all of the programming conventions used in the
protocol code to do the best possible extraction. The use of xg++ for
this application makes it feasible to write several customized
translators to produce different models of the same underlying system,
each focusing on different functionality. Each extracted model is then
combined with a manually constructed model of the rest of the system, a
correctness definition, and an initial state to form a complete model,
which is verified using the Murφ model checker.

Our  main  results  are: 

1. The approach is effective. We found eight hard errors in the code.
All of these could have crashed the system. Two are errors that only
occur on very specific sequences of events, which would make them
difficult to find through testing.

2. The approach is practical. Our extraction and translation extensions
are about 100 lines of code, which extract descriptions that are
approximately 1000 lines from implementations that are about 10K lines.
We did not have to make any modifications to the FLASH source, except
to preprocessing macros.

3. The approach is more effective than manual verification: it found
more bugs (the manual effort found none) and is significantly easier.
Its increased effectiveness is largely due to the automatic extraction,
which is more faithful to the implementation and checks many more
features than the manually constructed model.

We are not claiming that these techniques are fully automatic. Rather,
they automatically extract models from parts of the system whose
implementations are understandable by xg++. For example, the FLASH
network had to be manually modeled because it did not have an
implementation that could be automatically processed.

In this paper, we will explain our methodology and show how it was
applied to the FLASH cache coherence protocols. We begin with a
high-level overview of how the system works in Section 2. The steps
that require manual intervention are then detailed in Sections 3, 4,
and 5. Section 6 presents the results of our verification of the FLASH
protocols. We follow this by examining the accuracy of a manually
constructed model of a FLASH protocol in Section 7. A comparison of our
method to other similar methods is given in Section 8. Finally, we
conclude the paper in Section 9.

2  Overview  of  the  Extraction  Method 

In this section, we explain at a high level how our system works and
then give an example of how an extracted model compares with a manually
built model, as well as with the corresponding implementation code.
Figure 1 illustrates the process of extracting and verifying models of
the FLASH protocols. In our system, clients use the xg++ extension
language, metal, to write the metal slicer extension, which specifies
the state variables and subroutines that should be extracted into the
model. The user also specifies rules in the metal printer extension for
translating the sliced actions into a Murφ model description. The xg++
compiler then takes these two metal extensions along with the original
implementation code and produces a Murφ model of the protocol.

Murφ is a model checker that uses explicit state enumeration with a
Pascal-like language for specifying models. Model checkers perform
verification by exhaustively searching the reachable states of a system
for violations of user-specified invariants. In any given state, Murφ
will “nondeterministically” execute all possible outcomes. Each outcome
is a new state, which is checked for correctness and then inserted into
a table that contains visited states. If the state has been visited
earlier, it is pruned and its successors are not visited again.
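
The search loop just described can be sketched in a few lines. The
following is a minimal illustration in Python, not Murφ's actual
implementation: the toy system (two bounded counters), the successors
function, and the invariant are all hypothetical and were chosen only
to show the visited-state table, the correctness check on each new
state, and the pruning of states seen before.

```python
from collections import deque


def explore(initial, successors, invariant):
    """Breadth-first explicit-state search.

    Returns (number_of_states_seen, error_trace), where error_trace is
    None if no reachable state violates the invariant.
    """
    visited = {initial: None}  # visited-state table: state -> predecessor

    def trace_to(state):
        # Walk predecessors back to the initial state.
        path = []
        while state is not None:
            path.append(state)
            state = visited[state]
        return path[::-1]

    if not invariant(initial):
        return len(visited), [initial]

    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):   # every possible outcome
            if nxt in visited:          # seen before: prune
                continue
            visited[nxt] = state
            if not invariant(nxt):      # check each new state
                return len(visited), trace_to(nxt)
            queue.append(nxt)
    return len(visited), None


# Hypothetical toy system: two counters in 0..2, either may increment.
def successors(state):
    a, b = state
    out = []
    if a < 2:
        out.append((a + 1, b))
    if b < 2:
        out.append((a, b + 1))
    return out


# Hypothetical invariant: the counters never both reach 2.
def invariant(state):
    return state != (2, 2)


count, trace = explore((0, 0), successors, invariant)
```

Because the search is breadth-first, the trace returned for a violated
invariant is a shortest path from the initial state, which is the kind
of error trace a user would want to read.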

Before the Murφ model checker can be applied, the protocol model must
be combined with a model of the hardware on which the protocol runs.
Unfortunately, there is no easy way to automatically create this model
since the hardware model must describe everything from the behavior of
the processor interconnect to the functional units on the node
controller. As a result, the user must still write the hardware model
manually. The user must also specify a definition of correctness in the
form of invariants and assertions, as well as a starting state. In the
case of the FLASH protocols, these had to be manually specified since
none could be extracted from the implementation.

Figure 1. Flow chart of model extraction and verification.

At a high level, the user performs the following five steps:

1. Define the protocol state to be modeled. This is essentially a list
of variables and functions relevant to the properties to be checked and
comprises the metal slicer extension.

2. Add routines that insert or rewrite code. This step may add
correctness checks or abstract away detail for the model. This
comprises the metal printer extension.

3. Create a model of the hardware, correctness properties, and initial
state. This process is entirely manual.

4. Combine the extracted model with the manually specified components
to create a complete model that Murφ can check. This can be automated;
in our case, a set of scripts performed this function.

5. Check the model with Murφ. The model checker provides an error trace
if any correctness properties are violated.

However, these steps are not as difficult as they might appear. The
first three steps are only necessary when the model is first defined,
if there is a significant reimplementation, or if the scope of the
verification effort changes. Since the metal extensions are applied to
all handlers, they are independent of the number of handlers in the
protocol code or the length of the code. The first two steps can be
used to define several different protocol models, but these models will
usually have nearly identical hardware models, correctness properties,
and starting states. Furthermore, though the last two steps might need
to be performed more frequently than the first three, they are almost
completely automated in our system, thus minimizing the incremental
cost of keeping the model up to date if the underlying implementation
changes.

Now, let us examine how this extraction method works on an actual
segment of FLASH protocol code and how the extracted model compares
with a manually specified model. Figure 3 shows a manually specified
model of the FLASH protocol code in Figure 2, both with line number
annotations that illustrate the correspondence between the FLASH
implementation and model description. What the code segment actually
does is not important for this example. Rather, the reader should
notice that the paths of execution are dependent on the hl structure,
which is the directory state. In addition, there are various SEND
commands that cause messages to be sent on the processor (PI) or
network (NI) interfaces. Finally, the reader should note that there are
debugging assertions in the code that function just as assertions do in
any C code. A Murφ model description consists of a series of rules. The
rule bodies are executed when the rule precondition (the part before
the ==>) is true. The structure of the model differs from the code
because the author of the model chose to use two separate rules with
different preconditions that are explicit “if” statements in the code.
Despite this superficial difference, there is a clear mapping between
statements in the Murφ description and the FLASH implementation.

      void PILocalGet(void) {
        /* ... Boilerplate setup code ... */
        headLinkAddr =
          FAST_ADDRESS_TO_HEADLINKADDR(addr);
        FLDEBUG('h', "%u: headLinkAddr = %llx",
                procNum, headLinkAddr);
        READ_HEADLINK(headLinkAddr);
        nh.len = LEN_CACHELINE;
2,10    if (!hl.Pending) {
11        if (!hl.Dirty) {
            /* ... 37 lines deleted ... */
            ASSERT(!hl.IO);
            // The commented out ASSERT is
            // true 99.99% of the time, but is
            // not always
12!         // ASSERT(hl.Local);
            /* ... deleted 15 lines ... */
14          PI_SEND(F_DATA, F_FREE, F_SWAP,
                    F_NOWAIT, F_DEC, 1);
13          hl.Local = 1;
            /* ... deleted 14 lines */
3         } else {
5           ASSERT(!hl.List);
5           ASSERT(!hl.RealPtrs);
            FLSTAT_INC(procNum, readsCancelled);
            if (!hl.IO) {
5             ASSERT(hl.HeadPtr);
4             ASSERT(!hl.Local);
              nh.len = LEN_NODATA;
              /* setting opcode for send */
8             nh.msgType = MSG_GET;
              /* setting destination to
                 node that called us */
8             nh.dest = hl.Ptr;
8             NI_SEND(THIRD, F_NODATA, F_FREE,
                      F_NOSWAP, F_NOWAIT, 12);
              /* ... deleted 12 lines ... */
6             hl.Pending = 1;
      }

Figure 2. Code associated with model description in Figure 3.

      Rule "PI Local Get (Else)"
1:      Cache.State = Invalid & !Cache.Wait
2:      & !DH.Pending -- if pending NAK
3:      & DH.Dirty ==>
      Begin
4:      Assert !DH.Local "PILocalGet: L = AO";
5:      Assert DH.Head & !DH.List & DH.Real=0
               "PILocalGet: case D=1";
6:      DH.Pending := true;
7:      Cache.Wait := true;
8:      Send_Request(Home, DH.HPtr, Get,
                     Home, void);
      End;

      Rule "PI Local Get (Put)"
9:      Cache.State = Invalid & !Cache.Wait
10:     & !DH.Pending -- if pending NAK
11:     & !DH.Dirty ==>
      Begin
12:     Assert !DH.Local "PILocalGet: L = AO";
13:     DH.Local := true;
14:     CC_Put(Home, Memory);
      EndRule;

Figure 3. Partial Murφ model description for the PILocalGet handler in
Figure 2.

The core observation motivating our work is that the correspondence
between the model and the implementation is so strong that it should be
possible to automatically build the model description from the code. We
see this in Figure 4, which shows an automatically extracted model of
the code in Figure 2 that was derived by our system. The metal slicer
used to generate this description specifies that the hl and nh
variables, the SEND functions, and the assertions should be extracted.
The extracted model mirrors the code more closely than the manually
constructed model and is richer in its description. Specifically, the
header length assignments and assertions present in the code, which are
marked with asterisks in the figure, are included. Automatic extraction
makes it easy to model such additional features.

Figure 4. An automatically extracted model of the FLASH code in
Figure 2.

There are some differences between the manually constructed model and
the extracted model. An example of this is line 7 in Figure 3, which
does not appear in the protocol code because the hardware sets the
cache state. When manually modeling a system, the user is free to mix
actions of both the code and the hardware in the model. Our extracted
model does not include hardware actions, which must be modeled manually
instead. Line 12 in Figure 3 is a good example of both drift and
translation mistakes. It is an assertion that has been removed from the
implementation, but remains in the Murφ description. On the other hand,
it did not cause any false positives because of translation mistakes
elsewhere in the model. The problem of drift and translation mistakes
between manually written models and the underlying implementations will
be given more detailed treatment in Section 7.

Automatic extraction has two important benefits. First, the time
required to create a model is reduced, and thus the user can specify a
large number of small models that check orthogonal aspects of the same
implementation. These small models make model checking computationally
feasible, but do not sacrifice model detail. The other benefit is that
the automatic extraction ensures that the extracted model is faithful
to the original implementation.

3  The  Metal  Slicer 

We now discuss how one uses xg++ to extract a protocol model. xg++
breaks the extraction down into two tasks. First, it uses the metal
slicer to remove lines of code that do not affect the protocol state
the user is interested in modeling, thus slicing the implementation
down into a simpler model. Second, it translates the actions in the
protocol implementation into abstracted actions in the model with the
metal printer. We examine the metal slicer facility in this section and
leave the metal printer for the next section.

The metal slicer allows the user to match arbitrary expression patterns
in the implementation code to select a slice. Figure 5 gives a partial
example of a metal slicer that extracts actions needed to check that
the protocol code sets the length field of a packet header correctly.
Each pattern declaration (pat) selects a portion of the FLASH code that
will be extracted. For example, pat length indicates that the length
field in the message header is to be extracted as part of the model
description. Message sends and some directory values are also included:
the former since the protocol must ensure that messages have their
header length fields set correctly before sending, the latter so that
paths dependent on the directory state are executed. In total, the
metal extension of the dyn-ptr protocol is very compact, encompassing
approximately 40 lines without comments.

A slicing algorithm [22, 23] automatically derives all code that
affects the parts selected by the metal extensions. Our xg++ based
slicer computes a backward slice at the level of statements with an
algorithm based on the program dependence graph (PDG) [12]. The nodes
of a PDG represent program statements and the arcs represent the
control and data dependencies between statements. Control dependencies
occur when a statement can affect whether another statement is
executed. For example, the true and false branches of an “if” statement
are control dependent upon its condition. Data dependencies, on the
other hand, link definitions of variables to their uses. Intuitively,
these dependencies capture the flow of information from value producers
or mutators to their consumers. Data dependencies can be calculated
using the well-known “reaching definitions” data flow algorithm [1].
The slicing algorithm itself is implemented as a simple graph traversal
of the PDG.
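
The graph traversal can be illustrated with a short sketch. This is not
the xg++ slicer: the five statements and their dependence arcs below
are a hypothetical, hand-built PDG shaped loosely like a FLASH handler,
and the function name backward_slice is our own. The sketch only shows
that, once control and data dependencies are available, a backward
slice is everything reachable from the selected statements along those
arcs.

```python
def backward_slice(deps, criteria):
    """Return the statements the criteria depend on, transitively.

    deps maps a statement id to the ids it is control- or
    data-dependent on (PDG arcs, followed backward).
    """
    in_slice = set()
    worklist = list(criteria)
    while worklist:
        stmt = worklist.pop()
        if stmt in in_slice:
            continue
        in_slice.add(stmt)
        worklist.extend(deps.get(stmt, ()))
    return in_slice


# Hypothetical statements (ids are arbitrary labels, not source lines).
stmts = {
    1: "nh.len = LEN_CACHELINE;",
    2: "if (hl.Dirty) {",
    3: "FLSTAT_INC(procNum, readsCancelled);",
    4: "nh.len = LEN_NODATA;",
    5: "NI_SEND(...);",
}

# Hypothetical dependence arcs for the fragment above.
deps = {
    3: [2],     # control dependence on the branch
    4: [2],     # control dependence on the branch
    5: [2, 4],  # control dependence, plus data dependence on nh.len
}

# Slice with respect to the send: the statistics update drops out.
sliced = backward_slice(deps, criteria=[5])
kept = [stmts[i] for i in sorted(sliced)]
```

In this toy fragment the statistics counter is removed because nothing
selected depends on it, while the branch condition is kept because it
controls whether the send executes.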

Standard slicing techniques have difficulty producing accurate slices
in the presence of common C constructs such as pointers, unions, and
unstructured control flow. However, since FLASH protocol code shares
features common to low-level systems code, it uses those troublesome
features of C in very limited ways. As our slicer is configurable, we
can have it automatically abstract those features by rewriting those
sections, as will be demonstrated in Section 4.

With an extensible compiler such as xg++, the user need not understand
the details of manipulating the compiler’s internal data structures,
nor does the user need to understand the implementation of program
slicing algorithms.

4  The  Metal  Printer 

Translation of the sliced code is accomplished by the metal printer
extension, which allows the user to arbitrarily insert or rewrite
actions in the model description. This facility allows the user to
exploit domain-specific knowledge to create an optimal extraction. This
capability is implemented by matching user-specified patterns against
the abstract syntax tree as the slice is emitted. A pattern is enclosed
by the first set of braces before the ==>. If the pattern matches, then
the default output is suppressed and the pattern’s action (after the
==> token) is executed to output user-specified code. The special
emitter function mgk_e takes a printf-like format string augmented with
%t, which allows matched pattern subtrees to be output as strings. One
use of this facility is to include additional code that checks for
correctness properties. This can help users tighten the verification of
their models. Figure 6 gives an example of a protocol-specific printer
that automatically inserts assertions before each NI_SEND to check that
the length field in a network packet is correctly set before the packet
is sent.

sm len slicer {
    /* wildcard variables for pattern matching */
    decl { scalar } type, data, keep, swp, wait, nl;

    /* Pattern that will match all uses of the length field. */
    pat length = { nh.len };

    /* Patterns to match network and processor message
       sends, which use the length field. */
    pat sends =
        { NI_SEND(type, data, keep, swp, wait, nl) }
      | { PI_SEND(type, data, keep, swp, wait, nl) };

    /* Patterns to match accesses to directory entries */
    pat entries = { hl.Local } | { hl.Dirty } | { hl.List };

    /* Mark all matched patterns: the slicer will
       extract these and all code that influences them. */
    all: length | sends | entries ==> { mgk_tag(mgk_s); }
}

Figure 5. A metal slicer extension used to extract a model for
verification of length field handling.

In addition to strengthening the correctness properties, the metal
printer facility can be used to abstract away implementation details by
taking advantage of FLASH domain-specific knowledge. There are three
main areas in the FLASH verification where this is done: the emulation
of bit operations in Murφ, reconstructing implicit types from C unions,
and abstracting C data structures to reduce the state space.

Murφ is a more minimal language than C, and as such, does not provide
some of the facilities that C does. Some examples of this are the bit
operations that are often found in embedded systems code such as the
FLASH protocols. These operations must be emulated by the Murφ model to
allow for proper protocol modeling. We use our configurable printer to
match uses of unsupported operations and rewrite them to call
subroutines in the hardware model that emulate those actions.
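
To give a feel for what such an emulating subroutine might look like,
here is a small sketch in Python. It is an assumption on our part, not
a routine from the FLASH hardware model: the name emulated_and and the
width parameter are hypothetical, and the sketch simply restricts
itself to addition, multiplication, integer division, and remainder, as
one would in a language without bitwise operators.

```python
def emulated_and(a, b, width=8):
    """Bitwise AND of two non-negative integers below 2**width,
    computed without any bitwise operators."""
    result = 0
    place = 1                      # value of the current bit position
    for _ in range(width):
        if a % 2 == 1 and b % 2 == 1:
            result += place        # both low bits set: keep this bit
        a //= 2                    # shift right by one position
        b //= 2
        place *= 2
    return result
```

A rewrite rule in the printer would then replace each occurrence of the
C operator with a call to a subroutine of this kind in the generated
model.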

Another complication results from the loose typing of C. In the
bitvector protocol, the Vector variable represents a single node ID
when there is only one sharer, but holds a bitvector of sharers when
there is more than one. The bitvector protocol keeps the number of
sharers in a separate variable. Murφ does not have enough type
information to interpret these values since the union is implicit.
However, we may leverage the xg++ compiler to infer the type and
rewrite the extracted output. Thus, in the extracted model, two
variables replace the single Vector variable, one of which is a node
ID, and the other, a list of nodes. Each access to Vector is replaced
by an access to the appropriate variable while each modification
becomes two modifications, one to each of the extracted variables.
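
One way to picture the split is the following sketch. It is only our
reading of the rewrite described above, written in Python for brevity:
the class SplitVector, its method names, and the choice to decode the
raw value bit by bit are all hypothetical and do not come from the
extracted Murφ model.

```python
class SplitVector:
    """Two typed variables standing in for one overloaded Vector field."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.node_id = 0                    # meaningful with one sharer
        self.sharers = [False] * num_nodes  # meaningful with several

    def write(self, raw):
        # Each modification of Vector becomes two modifications,
        # one to each of the extracted variables.
        self.node_id = raw
        self.sharers = [bool((raw >> i) & 1) for i in range(self.num_nodes)]

    def read(self, num_sharers):
        # Each access picks the variable matching the sharer count,
        # which the protocol keeps in a separate variable.
        if num_sharers == 1:
            return self.node_id
        return [i for i, present in enumerate(self.sharers) if present]


v = SplitVector(num_nodes=4)
v.write(2)
single = v.read(num_sharers=1)
v.write(0b0101)
several = v.read(num_sharers=2)
```

Keeping both interpretations up to date on every write means a reader
never has to guess which meaning the last writer intended.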

Finally, to make model checking tractable, measures must be taken to
limit the number of states. When manually constructing a model, the
user will abstract data types when it is safe to do so. The same can be
done with the automatic extraction if the compiler can be made to
recognize instances where such abstractions can be made. For example,
the dyn-ptr protocol uses a linked list to keep track of the sharers on
a cache line. Implementing a literal linked list produces artificial
state space explosion and complicates the model description since Murφ
has no concept of pointers. Abstracting the linked list to an array
makes the model much simpler and more efficient. The dyn-ptr protocol
code manipulates the linked list through a set of functions, so every
access is explicit. The linked list in the implementation is thus
transformed into an array in the extracted model by configuring the
metal printer to rewrite calls to the linked list accessing functions.
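
The effect of this abstraction on the state space can be seen in a toy
comparison. The two classes below are hypothetical stand-ins written in
Python, not the dyn-ptr accessor functions or the extracted model: one
keeps a literal list whose shape depends on insertion order, the other
keeps one presence flag per node.

```python
class LinkedSharers:
    """Literal linked list: insertion order is part of the state."""

    def __init__(self):
        self.head = None                 # nested (node, rest) pairs

    def add(self, node):
        self.head = (node, self.head)

    def state(self):
        return self.head


class ArraySharers:
    """Array abstraction: only membership is part of the state."""

    def __init__(self, num_nodes):
        self.present = [False] * num_nodes

    def add(self, node):
        self.present[node] = True

    def remove(self, node):
        self.present[node] = False

    def state(self):
        return tuple(self.present)


# The same two sharers, added in opposite orders.
a, b = LinkedSharers(), LinkedSharers()
a.add(1); a.add(2)
b.add(2); b.add(1)

c, d = ArraySharers(4), ArraySharers(4)
c.add(1); c.add(2)
d.add(2); d.add(1)
```

The linked versions end in two distinct states although they describe
the same set of sharers, whereas the array versions end in one. A
checker exploring the linked form would visit both, which is the
artificial blow-up the abstraction avoids.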

sm printer tagged_printer {
    decl { scalar } data, keep, swap, wait, dec, null, type;
    all:
        /* Automatically insert length assertions before each send. */
        { NI_SEND(type, data, keep, swap, wait, null); } ==>
        {
            if (mgk_int_cst(data) != 0)
                mgk_e("assert(nh.len = len_data);");
            else
                mgk_e("assert(nh.len = len_nodata);");
            mgk_e("ni_send(%t, %t, procNum, nh);", type, swap);
        }
        /* rewrite 'len_cacheline' and 'len_word' as 'len_data' */
      | { len_cacheline } | { len_word } ==> { mgk_e("len_data"); }
}

Figure 6. A metal printer extension used to insert length field
assertions.

Interestingly, the extraction itself does not contribute directly to
state explosion in the model checker. The size of the state space and
encoding is determined by the way the user chooses to specify the data
structures in the model. Since the actions being extracted are executed
atomically, redundancies and extra local variables in the extraction do
not add extra states. They may result in additional computation time,
but for large models this is usually not the limiting factor.

The metal printer is a good example of the benefit of having an
extension-based system rather than an annotation-based one, since the
annotation is effectively automated by the rules set in the printer.
This alleviates the need for the user to manually place annotations
throughout the code.

5  The  Hardware  Model,  Correctness  Definition,  and  Starting  State

Before the model checker can be applied, the extracted protocol model
must be combined with a model of the hardware on which the protocol
runs. In addition, a definition of correctness in the form of
invariants and assertions must be specified, as well as a starting
state for the model. The user verifying the system must manually create
these components. Fortunately, these components usually do not change
much in the course of system development.

In  manually  modeled  systems,  actions  performed  by  the 
hardware  and  protocol  can  be  interleaved.  Because  part  of 
the  modeling  is  done  automatically  in  our  system,  this  is  no 
longer  true.  Rather,  the  hardware  must  be  described  sepa¬ 
rately  so  that  it  accurately  models  the  interaction  between 
hardware  and  the  extracted  model  description  of  the  pro¬ 
tocol.  There  are  two  types  of  interaction  that  concern  us. 
First,  the  protocol  code  can  make  calls  to  hardware  func¬ 
tional  units.  Examples  of  this  are  the  SEND  instructions, 
which  in  reality  are  assembler  instructions  that  cause  the 
FLASH node controller to transmit messages. The other
type of interaction is where the hardware causes certain
parts  of  the  protocol  code  to  execute.  For  example,  when 
the  FLASH  node  controller  receives  a  request,  it  consults 
a  table  that  causes  it  to  execute  a  specified  piece  of  code 
called  a  handler. 

On  FLASH,  the  protocol  code  activates  hardware  units  to 
perform  functions.  An  example  of  this  is  the  node  controller 
logic  that  sends  protocol  messages  out  on  the  various  I/O 
subsystem,  processor,  and  network  interfaces.  The  SEND 
instructions  in  the  protocol  code  normally  map  to  assembler 
commands,  which  are  decoded  and  executed,  eventually  ac¬ 
tivating  the  interface  logic.  The  hardware  model  maps  these 
instructions  to  a  subroutine  that  manipulates  the  model  net¬ 
work  and  node  controller  data  structures  in  the  appropri¬ 
ate  way  to  mimic  this  behavior.  Another  example  is  the 
“software  queue”,  which  is  provided  by  the  FLASH  node 
controller  hardware,  where  protocol  handlers  can  suspend 
themselves  in  instances  when  there  are  not  enough  physical 
resources  for  them  to  complete  their  tasks,  to  be  reactivated 
at  a  later  time.  Similarly,  the  hardware  model  has  subrou¬ 
tines  that  act  on  data  structures  that  mimic  this  queue.  The 
instructions  that  the  protocol  code  uses  to  activate  the  soft¬ 
ware  queue  are  mapped  onto  these  subroutines  that  we  have 
provided. 
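
As a rough illustration of this mapping, the following Python sketch treats a SEND and a software-queue suspension as subroutines that act on model data structures. It is not the hardware model used in the verification, which is written for the model checker and is not reproduced here. The names `NodeModel`, `send`, and `suspend`, and both queue capacities, are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field

NET_QUEUE_CAPACITY = 4   # assumed bound, chosen only for illustration
SW_QUEUE_CAPACITY = 2    # assumed bound, chosen only for illustration


@dataclass
class NodeModel:
    """Hypothetical model state for one node controller."""
    node_id: int
    net_in: deque = field(default_factory=deque)     # incoming network messages
    sw_queue: deque = field(default_factory=deque)   # suspended handlers


def send(nodes, src, dst, opcode):
    """Stand-in for a SEND instruction: enqueue a message at the destination."""
    assert src != dst, "nodes do not send network messages to themselves"
    assert len(nodes[dst].net_in) < NET_QUEUE_CAPACITY, "network queue overflow"
    nodes[dst].net_in.append((src, opcode))


def suspend(node, continuation_pc):
    """Stand-in for a handler suspending itself on the software queue."""
    assert len(node.sw_queue) < SW_QUEUE_CAPACITY, "software queue overflow"
    node.sw_queue.append(continuation_pc)
```

The assertions inside the subroutines show one place where hardware-level invariants, such as queue bounds, could be checked in a model of this shape.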

Naturally,  the  hardware  reactivates  the  suspended  han¬ 
dlers  at  a  later  time.  Thus,  we  arrive  at  the  other  form  of 
interaction,  where  the  hardware  causes  certain  parts  of  the 
protocol  code  to  run.  The  FLASH  node  controller  has  four 
input  queues  that  can  cause  handlers  to  execute.  One  of 
these  is  the  software  queue  where  handlers  are  suspended. 
The  others  are  input  queues  from  the  I/O,  processor,  and 
network  interfaces.  Before  suspending  themselves  on  the 
software  queue,  handlers  store  a  continuation  PC  to  a  field 
in  the  software  queue  entries.  If  there  is  a  valid  entry  present 
on  the  software  queue,  the  node  controller  can  select  this 


Invariant                                                   Protocols checked (of
                                                            Dynptr, Bitv, RAC, Coma)
----------------------------------------------------------  ------------------------
The RealPtrs counter does not overflow (RealPtrs            all four
  maintains the number of sharers)
Only a single master copy of each cache line exists         all four
  (basic coherence)
A node can never put itself on the sharing list (sharing    all four
  list is only for remote nodes)
No outstanding requests on cache lines that are already     all four
  in Exclusive state
Nodes do not send network messages to themselves            all four
Nodes never overflow their network queues                   all four
Nodes never overflow their software queues (queue used      all four
  to suspend handlers)
The protocol never tries to invalidate an exclusive line    all four
Protocol can only put data into the processor’s cache in    all four
  response to a request
All processor message header opcode fields are set to       all four
  valid opcodes
Opcode XOR operations always occur on known opcodes         all four
  (invalid opcodes are never created)
If there is no sharer in the HeadPtr, the sharing list      three (not Bitv)
  is empty
If the sharing list is not empty, RealPtrs, the number      three
  of sharers, is greater than zero
The protocol state is pending while waiting for             (marks unclear)
  invalidations
When a line is dirty, the sharing list is empty (this is    (marks unclear)
  only true if there are no handler suspensions)

Table 1. Description of all invariants checked.


suspended handler to be serviced by jumping to the continuation
PC. We model this behavior with an enumerated variable, whose
values represent all the possible entry points that the
continuation PCs can take, together with a dispatch function
that mimics the hardware jump mechanism. The dispatch
mechanism for the other three queues is similar. Each message
that arrives on one of the I/O, processor, or network
interfaces contains an opcode that indicates the message
type. A JumpTable configuration file, which indicates which
handler is executed for each type of arriving message, is
used to program the hardware dispatch. These dispatch
conditions can be easily transformed into the preconditions
that guard each extracted handler rule. In fact, the mapping
is simple enough that in our FLASH verification, this process
was automated with a simple script.
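
The dispatch scheme can be sketched in Python as follows. This is only an illustration of the idea: the table entries, handler names, and continuation entry points below are invented for the example and are not taken from the FLASH JumpTable file.

```python
from enum import Enum, auto


class ContinuationPC(Enum):
    """Hypothetical entry points at which a suspended handler can resume."""
    RESUME_GET = auto()
    RESUME_INVAL = auto()


# Hypothetical JumpTable contents: (interface, opcode) -> handler name.
JUMP_TABLE = {
    ("network", "GET"): "handle_net_get",
    ("network", "INVAL_ACK"): "handle_net_inval_ack",
    ("processor", "GET"): "handle_proc_get",
}


def make_guard(interface, opcode):
    """Turn one dispatch condition into a rule precondition."""
    def guard(msg):
        return msg["interface"] == interface and msg["opcode"] == opcode
    return guard


# One guarded rule per JumpTable entry, stored as (guard, handler name).
RULES = [(make_guard(i, op), h) for (i, op), h in JUMP_TABLE.items()]


def dispatch(msg):
    """Return the handlers whose guards accept this message."""
    return [handler for guard, handler in RULES if guard(msg)]


def resume(pc):
    """Mimic the hardware jump to a stored continuation PC."""
    targets = {
        ContinuationPC.RESUME_GET: "handle_net_get (resumed)",
        ContinuationPC.RESUME_INVAL: "handle_net_inval_ack (resumed)",
    }
    return targets[pc]
```

Because each guard is derived mechanically from one table entry, generating the guards with a script is straightforward, which is consistent with the automation described above.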

In addition to the hardware model, a correctness definition
must be provided. Table 1 gives a list of invariants that
we check. Some of these are invariants about the modeled
hardware components. For example, a node cannot send a
packet to itself, because the network will not route such a
request properly. On the other hand, some invariants are model
specific. For example, in the dyn-ptr protocol, if the list
of sharers is non-empty, then the head pointer cannot be
NULL. The bit vector protocol does not use a linked list,
so this invariant cannot be applied to that protocol. In
addition, the invariants may change depending on what aspects
of the protocol are modeled. The ability to specify
protocol-specific invariants allows the user to provide very
specific correctness conditions. However, as we see here, out
of 15 invariants, 11 apply to all cases. Thus, in our FLASH
verification, the invariants are largely independent of the
protocol model.
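
To make the distinction between common and protocol-specific invariants concrete, here is a minimal Python sketch that expresses three of the invariants from Table 1 as predicates over a directory entry. The `DirEntry` layout and all field names are assumptions for the example, and grouping the RealPtrs invariant with the linked-list invariants is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirEntry:
    """Hypothetical directory entry for one cache line."""
    head_ptr: Optional[int] = None                     # first sharer, or None
    sharers: List[int] = field(default_factory=list)  # sharing list
    real_ptrs: int = 0                                 # count of sharers
    home: int = 0                                      # home node of the line


def head_ptr_invariant(e):
    # If there is no sharer in the head pointer, the sharing list is empty.
    return e.head_ptr is not None or not e.sharers


def real_ptrs_invariant(e):
    # If the sharing list is not empty, the sharer count is greater than zero.
    return not e.sharers or e.real_ptrs > 0


def no_self_sharing_invariant(e):
    # A node never puts itself on its own sharing list.
    return e.home not in e.sharers


COMMON = [no_self_sharing_invariant]
PROTOCOL_SPECIFIC = [head_ptr_invariant, real_ptrs_invariant]  # assumed grouping


def check(entry, uses_linked_list):
    """Return the names of the invariants that this entry violates."""
    invariants = COMMON + (PROTOCOL_SPECIFIC if uses_linked_list else [])
    return [inv.__name__ for inv in invariants if not inv(entry)]
```

A default `DirEntry()` plays the role of a blank directory entry and satisfies all three predicates.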

Finally,  a  starting  state  must  be  provided.  For  FLASH, 
this  is  the  state  of  the  machine  at  power-on,  meaning  that  all 
valid  memory  is  at  its  home  node  and  the  directory  entries 
are  all  blank. 

6  Results 

With an extracted Murφ model, we found a total of eight
bugs in two of the four FLASH protocols modeled. We
found six errors in dyn-ptr (four network header bugs,
two counter overflows) and two in bitvector. In contrast,
the manual verification of dyn-ptr found no bugs.

The  results  of  the  verification  are  given  in  Table  2. 
The  size  of  the  manually  built  component,  which  includes 
the  hardware  model,  invariants,  and  starting  state,  changes 
slightly  between  protocols  because  of  the  different  invari¬ 
ants  and  needs  of  each  model.  Note  that  the  automatic  ex¬ 
traction  reduces  the  number  of  manually  written  lines  by  a 
factor  of  two  or  more.  What  is  even  more  significant  is  that 
since  our  method  faithfully  extracts  a  model,  the  user  need 
not  understand  every  detail  of  the  protocols  to  produce  one. 

Protocol          Errors  Protocol Size  Extracted Model  Manual Model  Metal Size
(Max Processors)  found   (lines)        (lines)          (lines)       (lines)
----------------  ------  -------------  ---------------  ------------  ----------
Dyn-Ptr (n=4)     6       12K            1100             1000          99
Bitvector (n=4)   2       8K             700              1000          100
RAC (n=4)         0       10K            1500             1200          119
Coma (n=4)        0       15K            2800             1400          159

Table 2. The results of verifying four protocols.

The automatically inserted assertions described in Section 2
found four bugs in dyn-ptr. To improve performance, the
protocol speculatively sets the data length field of a message
to optimize for the common case. The extractor was able to
determine what kind of message the protocol was sending, and
assertions were automatically placed before each send operation
to ensure that the data length field was set correctly. Because
Murφ exhaustively exercises all paths, it detected the
four uncommon cases where the speculation was false but
there was no correction code.
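
The following Python sketch illustrates this class of bug on a toy handler. It is not FLASH code: the opcodes `PUT` and `NAK`, the length encodings, and the `line_is_dirty` condition are all invented for the example.

```python
from itertools import product

LEN_NODATA, LEN_CACHELINE = 0, 1   # hypothetical length-field encodings

# Hypothetical table: does a message with this opcode carry a cache line?
CARRIES_DATA = {"PUT": True, "NAK": False}


def handler(line_is_dirty, fix_uncommon_case):
    """Toy handler that speculatively sets the length field, then sends."""
    length = LEN_CACHELINE          # speculate: the common case replies with data
    opcode = "PUT"
    if line_is_dirty:               # uncommon case: reply without data
        opcode = "NAK"
        if fix_uncommon_case:
            length = LEN_NODATA     # correction code
    return opcode, length


def length_assertion(opcode, length):
    """Check inserted before each send: length must match the message kind."""
    expected = LEN_CACHELINE if CARRIES_DATA[opcode] else LEN_NODATA
    return length == expected


def failing_paths(fix_uncommon_case):
    """Exhaustively try every input and collect those that violate the check."""
    return [dirty for (dirty,) in product([False, True])
            if not length_assertion(*handler(dirty, fix_uncommon_case))]
```

Without the correction code, only the uncommon path fails the assertion, which is why exhaustive exploration finds it while tests of the common case do not.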

After  fixing  the  preceding  bugs,  two  subtle  counter  over¬ 
flow  errors  were  found.  Both  errors  involved  miscalcula¬ 
tions  of  the  maximum  number  of  sharers  that  a  counter  had 
to  record.  They  are  particularly  malicious  in  that  they  only 
manifest  themselves  as  a  result  of  a  single  rare  interleaving 
of  events. 

The  first  bug  involves  a  performance  optimization,  limit 
search,  used  in  the  dyn-ptr  protocol.  The  problem  with 
using  a  linked  list,  as  dyn-ptr  does,  is  that  the  worst  case 
overhead  of  searching  for  a  single  sharer  becomes  linear 
with  the  number  of  sharers.  Such  a  search  occurs  when  a 
node  n  is  already  on  the  list  and  evicts  the  cache  line  due 
to  a  capacity  or  conflict  cache  miss.  As  a  result,  n  should 
no  longer  be  on  the  sharing  list  and  needs  to  be  removed. 
In  practice,  a  linked  list  traversal  on  every  cache  line  evic¬ 
tion  is  far  too  costly.  However,  the  sharer  must  be  removed 
from  the  list  or  repeated  evictions  and  requests  can  cause 
the  list  to  grow  without  bound.  The  limit  search  optimiza¬ 
tion  makes  the  cost  of  cache  line  eviction  independent  of 
list  length.  If  the  sharer  that  is  to  be  removed  is  not  found 
after  searching  a  fixed  number  of  list  entries  (the  limit),  a 
counter,  StalePtrs,  is  incremented  to  indicate  that  there 
is  a  “stale”  sharer  in  the  list.  When  StalePtrs  reaches  its 
maximum value, all sharers on the list are invalidated to
remove the duplicate sharers. A second counter, RealPtrs,
is used to keep track of the list size. It is incremented on
every sharer addition and decremented on every sharer
deletion. As a result, RealPtrs must be large enough to hold
the maximum number of sharers on a list, which is the
maximum value of StalePtrs plus the number of nodes on
the system.*

Unfortunately, due to the reallocation of bits in the
structure used to hold these counters, RealPtrs was 7 bits
wide while StalePtrs was 10 bits wide, causing RealPtrs
to overflow on the actual machine. The model checker
detects a clear sequence of events that leads to the
counter overflow.

*Actually, this is not really true because of the next bug.
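
The arithmetic behind this overflow can be checked with a short Python sketch. It assumes that StalePtrs may reach the largest value its 10-bit field can hold; the text does not state the configured limit, so that is an assumption of the example.

```python
def max_value(bits):
    """Largest value an unsigned counter of the given width can hold."""
    return (1 << bits) - 1


def required_bits(max_count):
    """Smallest width whose range 0..2**bits - 1 includes max_count."""
    return max(1, max_count.bit_length())


STALE_BITS = 10   # width of StalePtrs, from the text
REAL_BITS = 7     # width of RealPtrs, from the text


def real_ptrs_can_overflow(num_nodes):
    # Stated requirement: RealPtrs must hold max(StalePtrs) + number of nodes.
    needed = max_value(STALE_BITS) + num_nodes
    return needed > max_value(REAL_BITS)
```

Under that assumption, a 7-bit RealPtrs (maximum 127) cannot cover a 10-bit StalePtrs (maximum 1023) plus the node count, even for the four-node configuration that was model checked.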

The  second  overflow  error  also  occurred  on  the 
RealPtrs  counter,  which  maintains  a  count  of  the  num¬ 
ber  of  sharers.  In  the  absence  of  limit  search  (maximum 
value  of  StalePtrs  is  zero),  the  maximum  value  of 
RealPtrs  was  thought  to  be  the  maximum  number  of 
physical  nodes  that  can  be  supported  on  a  system.  How¬ 
ever,  a  specific  interleaving  of  messages  can  result  in  a 
RealPtrs  count  of  one  greater  than  the  number  of  nodes, 
thus  breaking  the  rule.  Because  the  implementation  of  the 
protocol  has  space  allocated  to  RealPtrs  for  128  nodes 
regardless  of  the  number  of  nodes  on  the  system,  this  bug 
never  occurs,  even  after  extensive  use  of  the  machine,  be¬ 
cause  only  72  nodes  exist  and  thus  the  RealPtrs  limit 
is  never  tested.  However,  in  the  future  if  the  width  of 
RealPtrs  decreases  or  a  larger  machine  is  built,  this 
would  cause  failures. 

Finally,  there  were  two  errors  found  in  the  bitvector 
protocol.  We  found  one  case  where  a  message  was  sent  on 
the  wrong  network  lane.  Neither  the  simulator  nor  the  hard¬ 
ware  checks  that  the  messages  are  on  the  correct  lanes,  and 
there  is  no  manually  built  model  of  the  bitvector  proto¬ 
col  that  would  have  caught  this  error.  A  false  assertion  was 
also  discovered  in  the  bitvector  protocol.  It  checked  an 
incorrect  invariant  about  the  I/O  state.  It  was  not  caught  ear¬ 
lier  because  assertions  are  usually  disabled  on  the  hardware 
and  I/O  is  not  modeled  in  simulation. 

7  Imposing  Model  Descriptions 

We also studied the extent to which manually described
models can be inaccurate, either due to translation errors or
due to “drift.” While it is not clear which results in more
errors, the distinction is immaterial since the end result is
the same: bugs may be missed if a model is specified
incorrectly. We use xg++ to create an automatic “checker”
that looks for semantic differences between a model and the
matching protocol code. To collect data, we apply this xg++
extension to the model of the dyn-ptr protocol created by
Park and Dill and the current FLASH protocol code [19].
This data gives us an idea of how faithful manually written
model descriptions are to their underlying implementations.

Rules in Murφ contain a precondition that guards actions.
We extract each rule’s actions and precondition using a
modified version of the Murφ front-end parser. Included in
this process is converting strongly typed Murφ variables to
C’s weak type system. To translate semantics from the model
to the protocol code, we provide a table that maps each model
variable to its FLASH equivalent. For each rule, the extension
uses a heuristic on the name of the rule to determine the
corresponding FLASH handler. An xg++ extension uses this
mapping to attempt to match the actions of each rule to those
in its handler. It searches for a path in the FLASH handler
that will satisfy the rule’s preconditions by observing all
conditionals, assignments, and assertions. For example, given
the precondition !DH.Pending & DH.Dirty, it searches for a
sequence that implies DH.Pending to be 0 and DH.Dirty to
be 1. This can be inferred by tracking “if” statements,
assertions, and assignments in the code. If no such path
exists, the extension emits an error message.
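
A much simplified version of this search can be sketched in Python. The sketch assumes that each candidate path has already been flattened into a list of steps and that every step fixes one variable to one constant. The real extension works on the compiler's representation of the handler, which is not shown here, and the function names are invented for the example.

```python
def facts_along(path):
    """Accumulate the variable values implied by a straight-line path.

    Each step is (kind, var, value) with kind in {"if", "assert", "assign"}.
    An "if" or "assert" step records the value the variable must have for
    execution to continue along this path.
    """
    facts = {}
    for kind, var, value in path:
        if kind in ("if", "assert") and facts.get(var, value) != value:
            return None            # contradictory path, cannot be executed
        facts[var] = value
    return facts


def find_path(paths, precondition):
    """Return the index of the first path whose facts imply the precondition."""
    for i, path in enumerate(paths):
        facts = facts_along(path)
        if facts is not None and all(facts.get(v) == val
                                     for v, val in precondition.items()):
            return i
    return None   # the extension would report an error in this case
```

For the precondition in the text, the dictionary `{"DH.Pending": 0, "DH.Dirty": 1}` would be matched against each path's accumulated facts.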

After a path satisfying the precondition has been found,
the extension attempts to match all actions associated with
that rule along that path. The manual model description
is simple enough that there are only four types of actions
to find: assignments, assertions, decrements, and message
sends. Assignments and decrements can be transliterated
from Murφ to FLASH. Note that a conditional that checks
that the variable has the value assigned in the Murφ model
also implies that the assignment is matched. This condition
arises when the model omits details. Assertions in Murφ are
simple binary boolean operations consisting of one operator
(equal, not-equal) and two operands. They can be matched
by either an assertion in the implementation or a conditional
that implies that they are true. Message sends, on the other
hand, require special treatment since their operations can be
more diffuse in the FLASH code. For example, the message
send at line 8 in Figure 3 encompasses three separate
statements in Figure 2. For a send, the message opcode and
outgoing lane arguments are checked, as well as the outgoing
interface. Note that the extension only maps elements in
the model onto elements in the protocol code.

Every  action  in  the  manual  model  description  was 
checked  against  the  implementation.  This  found  14  differ¬ 
ences  between  the  model  and  the  implementation.  These 
differences  fall  into  four  categories:  semantically  non¬ 
equivalent  code  rearrangement,  hard  errors  in  the  translation 
of  the  model,  semantically  equivalent  syntactic  differences, 
and  incidental  differences  resulting  from  modeling  a  subset 
of  the  implementation.  We  consider  the  first  two  categories 
to  be  translation  errors  that  could  hide  potential  bugs. 

There were two cases in the first category. These
consisted of cases where assertions that were guarded by “if”
statements in the model had been hoisted past the
corresponding “if” statements in the protocol code. Guarding the
assertions with an extraneous conditional made the manual
model description weaker than the implementation, since in
the model the assertions are only checked on that path.
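
The effect of such a guard can be seen in a toy Python example. The state variables `local` and `pending` are hypothetical and do not correspond to the actual FLASH assertions; the point is only that a guarded assertion accepts states that the unguarded one rejects.

```python
def model_check(state):
    """Manual model: the assertion is only reached inside the conditional."""
    if state["local"]:
        return state["pending"] == 0   # assertion evaluated on this path only
    return True                        # assertion skipped on the other path


def implementation_check(state):
    """Protocol code: the same assertion sits before the conditional."""
    return state["pending"] == 0


# Enumerate every combination of the two state variables.
STATES = [{"local": l, "pending": p} for l in (False, True) for p in (0, 1)]

# States that the weaker model accepts but the implementation would reject.
missed = [s for s in STATES if model_check(s) and not implementation_check(s)]
```

Here `missed` contains the single state in which the guard is false and the asserted condition does not hold, which is exactly the behavior the weaker model cannot flag.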

There were two errors in translation, which could mask
errors in the model itself. In one, the model of the
NILocalGetXDelayed handler first checks that it is legal to
assign the value 0 to the variable DH.Real before making
the assignment. In the actual protocol implementation,
RealPtrs is a counter for the number of sharers on the
linked list. Thus, setting it to zero is a violation of the way
this counter should have been used. The other error involves
the assertion shown in line 12 of Figure 2 and Figure 3. This
assertion is incorrect, but survived verification because the
manual model description lacked the details to trigger it.

There were six cases where implementation code was
translated to semantically equivalent but syntactically
dissimilar model code. For example, in the NIInvalAckDelayed
handler, the protocol decrements the counter RealPtrs
and then tests for equality to 0. The model tests if
RealPtrs equals 1 and then decrements. Since handlers
on the same node run sequentially, these actions are
equivalent.
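
The equivalence of the two orderings is easy to confirm by enumeration. In the Python sketch below, the function names are invented, and the range of counter values is an assumption based on the 7-bit width mentioned earlier.

```python
def protocol_style(real_ptrs):
    """Decrement first, then test the new value against 0."""
    real_ptrs -= 1
    return real_ptrs, real_ptrs == 0


def model_style(real_ptrs):
    """Test the old value against 1, then decrement."""
    was_last = real_ptrs == 1
    real_ptrs -= 1
    return real_ptrs, was_last


# Compare both orderings for every positive value of a 7-bit counter.
equivalent = all(protocol_style(n) == model_style(n) for n in range(1, 128))
```

The comparison only holds because nothing else can modify the counter between the test and the decrement, which is the sequential-execution argument made above.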

Finally, there were four incidental differences. In one,
the model indicates that an INVAL_ACK message should
be sent, but the implementation sends an INVAL message
instead. In reality, the two opcodes have the same
underlying bit encoding, so they are equivalent even though
they are syntactically different. Other examples arose because
the model only partially describes the protocol, and so must
make assumptions about the modeled state.

In  summary,  14  differences  were  found:  two  rearrange¬ 
ments,  one  translation  error  that  weakened  the  model,  one 
false  assertion  that  was  hidden  by  a  simplified  hardware 
model,  and  ten  incidental  differences.  These  differences  il¬ 
lustrate  the  problems  caused  by  manual  modeling  both  in 
its  initial  construction  and  in  its  modification  to  track  im¬ 
plementation  changes. 

8  Related  Work 

In  previous  work,  xg++  was  used  to  build  a  set  of  static 
checkers  for  both  the  FLASH  protocols  [7]  and  for  general 
systems  code  [10].  This  paper’s  use  of  model  checking  and 
slicing-based  model  construction  is  a  fundamentally  differ¬ 
ent  approach  to  finding  errors.  The  methods  of  both  papers 
are  largely  complementary.  The  errors  found  in  this  paper 
require  dynamic  information  and  can  catch  very  convoluted 
race  conditions.  In  contrast,  the  static  checkers  are  shal¬ 
lower,  but  more  light-weight  and  do  not  need  to  simulate 
any  protocol  code. 

We could have potentially used other open compilers to
extract models. These include Lord’s ctool [16], Crew’s
ASTLOG [9], Shigeru Chiba’s Open C++ [5, 6], and
Srivastava and Eustace’s ATOM [20] object-code modification
system. However, it appears that the first three would
have required extensive retooling to support the analysis we
needed. ATOM, on the other hand, works at too low a level
for our purposes.

There  is  one  published  example  of  model  checking  be¬ 
ing  used  on  an  implementation  of  a  cache  coherence  proto¬ 
col  [11].  In  this  case,  the  implementation  is  in  hardware. 
The  model  checking  technique  was  to  use  refinement  in  Ca¬ 
dence  SMV.  So  far  as  we  know,  no  one  else  has  been  able 
to  apply  this  verification  approach. 

There  have  only  been  a  few  systems  to  do  computer- 
assisted  model  extraction.  The  Bandera  system  is  a  so¬ 
phisticated  model  extractor  for  Java  programs  [8].  Ban¬ 
dera  has  two  methods  for  extraction.  The  first  is  a  program 
slicer  that  accepts  temporal  properties  as  slicing  criteria  and 
uses  sophisticated  static  analysis  algorithms  to  do  accurate 
slicing.  Effective  slicing  in  Java  requires  new  slicing  al¬ 
gorithms  for  multi-threaded  programs.  The  slicer  removes 
irrelevant  code  and  variables  that  could  otherwise  blow  up 
the  state  space  during  model  checking.  The  second  tech¬ 
nique  is  data  abstraction.  The  user  maps  data  types  to  a 
small  set  of  abstract  values.  Abstract  versions  of  operations 
applied  to  these  data  types  are  defined.  Since  the  number  of 
states  visited  by  a  model  checker  is  a  function  of  the  number 
of  distinct  values  each  variable  can  have,  this  also  has  the 
potential  for  greatly  reducing  the  state  space  during  model 
checking. 
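
As a generic illustration of data abstraction, and not a rendering of Bandera's own interface, the Python sketch below maps integers to three abstract values and defines an abstract addition over them. The value names and the extra `top` element for "unknown" are choices made for this example.

```python
def abstract(n):
    """Map a concrete integer to one of three abstract values."""
    return "neg" if n < 0 else "zero" if n == 0 else "pos"


def abstract_add(a, b):
    """Abstract addition; 'top' means the sign cannot be determined."""
    if a == "zero":
        return b
    if b == "zero":
        return a
    return a if a == b else "top"
```

A variable that ranged over all integers now ranges over at most four values, which is the source of the state space reduction described above.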

Our  approach  is  more  pragmatic  than  Bandera’s.  Our 
method  permits  the  use  of  an  open-ended  collection  of  ad 
hoc  extraction  methods,  and  is  optimized  for  finding  bugs. 
It  would  be  difficult  to  imagine  handling  the  FLASH  proto¬ 
col  implementation  without  this  flexibility.  Bandera  has  not 
been  successfully  applied  to  examples  comparable  in  com¬ 
plexity  to  the  FLASH  protocols. 

The  SLAM  project  at  Microsoft  Research  extracts  a  pro¬ 
gram  with  only  boolean  variables  from  C  code  [2,  3].  These 
variables  represent  boolean  conditions  in  the  original  code. 
This  program  is  model  checked,  and  the  resulting  coun¬ 
terexamples  are  verified  using  symbolic  execution  and  deci¬ 
sion  procedures.  If  a  counterexample  is  found  to  be  a  false 
alarm, constraints are added to the boolean program to
improve the model. The goal of the project is to check
assertions in the code. In contrast, we extract a model and
then use Murφ to check higher-level properties of several
instances of the model running concurrently. It is difficult
to imagine solving this problem with SLAM because of
limitations on the properties it can check and the scale of the
resulting model checking problem.

An approach that is similar to ours in philosophy was
used to check Lucent’s PathStar system [14]. A “control
skeleton,” which consists of only the control constructs of a
system, was extracted using a simplified C parser, and then
selected constructs (such as message sends and receives)
were extracted from the original source using a collection
of pattern matching rules. The result was checked using
the SPIN model checker, which is an explicit state model
checker somewhat like Murφ.

An alternative approach to ours is the Teapot system,
which is a programming environment for software
implementations of multiprocessor cache coherence
protocols [4]. Teapot couples a domain-specific language
for writing cache coherence protocols with Murφ, which is
used to verify the protocols. The protocols are automatically
translated to C after verification. Program generation,
as  in  Teapot,  is  a  good  approach  when  applicable.  How¬ 
ever,  it  relies  on  the  availability  of  adequate  compilation 
techniques  for  the  highly  specialized  hardware  used  in  the 
multiprocessor  interconnect.  There  are  numerous  examples 
in  the  FLASH  protocol  where  hand  optimization  was  nec¬ 
essary  because  the  compiler  was  inadequate.  The  customiz¬ 
able  extraction  methods  described  in  this  paper  can  be  ap¬ 
plied  in  the  majority  of  cases  when  program  generation  is 
impractical. 

9  Conclusion 

We  have  demonstrated  a  simple  approach  to  automati¬ 
cally  extracting  models  from  protocol  code.  Our  method 
both  reduces  the  effort  of  using  model  checking  and  makes 
it  more  effective  by  ensuring  that  the  extracted  model  is 
more  faithful  to  the  original  protocol  code.  We  were  able 
to  apply  model  checking  to  four  protocols  in  less  time  than 
it  took  to  manually  verify  just  one.  One  of  the  great  bene¬ 
fits  is  that  the  amount  of  manual  labor  required  is  reduced 
by  a  significant  amount.  In  addition,  our  models  are  more 
complete  and  found  errors  that  eluded  the  manual  verifica¬ 
tion  process.  The  automatic  nature  of  the  extraction  also 
reduces  the  problem  of  drift,  ensuring  that  the  model  that  is 
checked  closely  tracks  the  underlying  implementation.  In 
total,  our  method  found  eight  protocol  bugs  that  were  not 
found  by  the  manual  verification.  We  also  found  four  dis¬ 
crepancies  between  the  manually  described  model  and  the 
implementation  that  may  have  accounted  for  some  of  the 
missed  bugs.  Our  method,  though  automatic,  does  not  im¬ 
pact  the  state  space  of  the  model  created. 

The  core  of  our  approach  is  the  use  of  an  extensible 
compiler.  Compilers  understand  code  at  a  programming 
language  level.  We  leverage  this  understanding  to  build  a 
model  from  the  implementation  code.  This  is  accomplished 
through  two  facilities  provided  to  us  by  xg++.  One  is  the 
metal  slicer,  which  is  used  to  select  features  in  the  imple¬ 
mentation  to  extract.  The  other  is  the  metal  printer,  which 
allows the user both to specify additional checks to tighten
the criteria for correctness and to specify rules for
recognizing opportunities to perform abstraction. In
combination with a model checker that takes imperative language
input such as Murφ, models can be quickly and easily
constructed. The benefit here is that a greater amount of a
system can be checked by extracting many orthogonal models
and  checking  each  separately.  While  the  method  is  not  fully 
automatic,  some  of  the  verification  tasks  which  are  both  te¬ 
dious  and  error  prone  have  been  automated. 

We feel that this method is applicable to a range of
problems encountered while debugging and verifying low level
systems. It seems particularly effective on code found in
embedded applications, where the code is easily analyzed
by tools but difficult for humans to read.
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